Abstract: Every organization needs to understand its
customers' behavior, preferences and future needs,
which depend on past behavior. Web Usage Mining is
an active research topic in which user session clustering
is done to understand users' activities. In this paper, we
use a neural-network-based approach, the Self Organizing
Map (SOM), to cluster sessions as a form of trend analysis
with some parameters. The approach depends on the
performance of clustering the number of requests. We apply
the SOM algorithm within Most Frequent Sequential Traversal
Pattern Mining, a combination called SP-SOM, and generate
clusters of web data. In this research we establish good
prediction with respect to the quantity of data and the
quality of the results.

Keywords - Web Usage Mining; Frequent Sequential 
Patterns; Sequence Tree; Web Log Data; Web Services; Neural 
Network; Clustering. 

I. INTRODUCTION

The WWW [2] is an immense source of data that
comes either from Web content, represented
by the billions of pages openly available, or from
Web usage, represented by the log information
collected daily by servers around the world. Web
Mining is the part of Data Mining that deals with
the extraction of interesting knowledge from the
World Wide Web. Web usage mining [4] has many
applications, e.g., personalization of web content,
design support, recommendation systems,
pre-fetching and caching [23].

Kohonen Self-Organizing Maps (SOM) [8, 10] were
developed by Teuvo Kohonen, a professor emeritus
of the Academy of Finland. SOMs learn through
unsupervised competitive learning. They are called
“maps” because they attempt to map their
weights to conform to the given input data. The
nodes in a SOM network attempt to become like
the inputs presented to them, and the topological
relationships between input data are preserved
when mapped to a SOM network.
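
The competitive learning described above can be illustrated with a small sketch. It is not the implementation used in this paper: the grid size, learning-rate schedule, Gaussian neighbourhood and the sample session vectors are all assumptions chosen for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Competitive step: the node whose weights are closest to x wins."""
    return min(weights,
               key=lambda n: sum((wi - xi) ** 2 for wi, xi in zip(weights[n], x)))

def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink as training proceeds.
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.01)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Nodes close to the winner on the grid move more strongly,
                # which is what preserves the topology of the input.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical session vectors (e.g. normalised visit counts for two pages).
SESSIONS = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(SESSIONS)
clusters = {tuple(s): best_matching_unit(som, s) for s in SESSIONS}
```

Each session is assigned to the grid node that wins the competition for it, so sessions that share a node form one cluster.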

II. RELATED WORK 

PrefixSpan [1], a more efficient pattern-growth
algorithm, was proposed to improve the
mining process. The main idea of PrefixSpan is to
examine only the prefix subsequences and project
only their corresponding suffix subsequences into
projected databases. An earlier database projection
growth based approach, FreeSpan [1], had also been
developed. Although FreeSpan outperforms the
Apriori-based GSP algorithm, FreeSpan may
generate any substring combination in a sequence.
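
The prefix-projection idea can be sketched as follows. This is a simplified illustration that assumes every sequence element is a single item (no itemsets); the function name and the sample sequences are hypothetical and do not come from [1].

```python
from collections import Counter

def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for sequences of single items."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Projection: keep only what follows the first occurrence of item,
            # so every recursive call works on shorter suffixes.
            next_db = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_db)

    mine((), [tuple(s) for s in sequences])
    return results

# Hypothetical page-visit sequences.
SAMPLE = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
patterns = prefixspan(SAMPLE, min_support=2)
```

Because each projected database holds only suffixes, the data scanned shrinks as the prefix grows.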

The projection in FreeSpan must keep all
sequences without length reduction.

In SPADE [11], a vertical id-list data format was
presented, and frequent sequence enumeration
was performed by a simple join on id-lists. SPADE
can be considered an extension of vertical-format
based frequent pattern mining. The discovery of
users' navigational patterns using SOM [8, 10] was
proposed by Etminani. Huge amounts of information
are collected continuously by web servers and
gathered in access log files, and analysis of server
access data can offer important and helpful
information. The author applied Kohonen's SOM (Self
Organizing Map) to preprocessed web logs to
extract the common patterns.
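
The id-list join mentioned for SPADE can be illustrated with a reduced sketch. It covers only the join that produces 2-sequences from single items and again assumes single-item elements; the helper names and sample data are hypothetical.

```python
from collections import defaultdict

def vertical_format(sequences):
    """Map each item to its id-list of (sequence_id, position) pairs."""
    idlists = defaultdict(list)
    for sid, seq in enumerate(sequences):
        for pos, item in enumerate(seq):
            idlists[item].append((sid, pos))
    return idlists

def temporal_join(idlist_a, idlist_b):
    """Id-list of 'a then b': b-occurrences that follow some a in the same sequence."""
    first_a = {}
    for sid, pos in idlist_a:
        first_a[sid] = min(pos, first_a.get(sid, pos))
    return [(sid, pos) for sid, pos in idlist_b
            if sid in first_a and pos > first_a[sid]]

def support(idlist):
    """Support is the number of distinct sequences in the id-list."""
    return len({sid for sid, _ in idlist})

# Hypothetical page-visit sequences.
SEQS = [["a", "b", "c"], ["a", "c"], ["b", "a"], ["a", "b", "b"]]
lists = vertical_format(SEQS)
a_then_b = temporal_join(lists["a"], lists["b"])
```

The support of the longer pattern is read directly from the joined id-list, without rescanning the original sequences.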

In WUM [14] we find the behavior of a user whether
or not the user is registered. If a website requires
users to sign in before they can start browsing, it is
easy not only to differentiate between users
but also to identify each single user. The problem
arises when a website allows visitors to browse its
content anonymously, which is commonplace. In this
paper, differentiating between visitors' activity in
logs recorded in the common log format [6] is
treated as a challenging task.
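
To make the anonymous-visitor problem concrete, the sketch below parses Common Log Format lines and groups requests into sessions. Treating the client host as the visitor and splitting sessions after 30 minutes of inactivity are common heuristics assumed here; neither is stated in this paper, and the sample log lines are invented.

```python
import re
from datetime import datetime, timedelta

# host ident authuser [timestamp] "METHOD url protocol" status size
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')

def parse_line(line):
    """Return (host, timestamp, url) or None if the line is not valid CLF."""
    m = CLF.match(line)
    if m is None:
        return None
    host, stamp, _method, url, _status, _size = m.groups()
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return host, when, url

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group requests per host; a gap longer than timeout starts a new session."""
    sessions = []
    last = {}  # host -> (time of last request, index into sessions)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        host, when, url = rec
        if host in last and when - last[host][0] <= timeout:
            idx = last[host][1]
            sessions[idx][1].append(url)
        else:
            sessions.append((host, [url]))
            idx = len(sessions) - 1
        last[host] = (when, idx)
    return sessions

LOG = [
    '10.0.0.1 - - [01/Sep/2015:10:00:00 +0000] "GET /books HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Sep/2015:10:01:00 +0000] "GET /toys HTTP/1.1" 200 128',
    '10.0.0.1 - - [01/Sep/2015:10:05:00 +0000] "GET /electronics HTTP/1.1" 200 256',
    '10.0.0.1 - - [01/Sep/2015:11:00:00 +0000] "GET /books HTTP/1.1" 200 512',
]
sessions = sessionize(LOG)
```

Host-based identification is only an approximation: visitors behind a shared proxy collapse into one host, which is part of why the task is challenging.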



III. PROPOSED WORK 

In this paper, we use the Self Organizing Map
(SOM), a type of neural network, with frequent
sequential patterns. In the process of Web Usage
Mining [14], it is used as a trend analysis to detect
users' patterns. It depends on the performance of
clustering the quantity of requests. Here we use the
SOM algorithm within the SP-SOM (Frequent
Sequential Traversal Pattern Mining with SOM)
algorithm.




Figure-1: Proposed SP-SOM Approach

The procedure details the transformations needed
to turn the data stored in the Web server log files [6]
into an input for SOM. Proceeding this way, we first
use the SOM algorithm to obtain clusters of web
data. We then load the web-data clusters, which are
closely related to frequent patterns. After that we
apply the min-max weight of each page in the
sequential traversal pattern. Finally we establish
good prediction with the quantity of results.
Figure-1 shows the process of the proposed work, in
which sessional web data is collected and SOM is
applied after preprocessing [7]. It then mines
density-based clusters and finds the closed frequent
items in the sessional web data to obtain useful
information.

3.1 SP-SOM Algorithm 

The proposed algorithm is used for finding the most
frequent sequential traversal patterns with a
clustered index. To handle the ordering problem,
SP-SOM first filters frequent sequential patterns by
support together with the min-max or average
weight parameter [12] of each item. It then uses a
neural network algorithm to cluster the index by
object similarity. Finally it creates the most frequent
sequential pattern tree [21]. This tree produces a
smaller candidate set and is also used to predict the
next item for caching [23].
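
How a pattern tree can drive next-item prediction is sketched below. The tree here is a plain prefix tree with counts, which is an assumption made for illustration; the structure of the tree in [21] may differ, and the patterns and counts are invented.

```python
class PatternTree:
    """Prefix tree over frequent sequential patterns, with a count per node."""

    def __init__(self):
        self.root = {"count": 0, "children": {}}

    def insert(self, pattern, count=1):
        """Add a pattern; every node along its path accumulates the count."""
        node = self.root
        for item in pattern:
            node = node["children"].setdefault(item, {"count": 0, "children": {}})
            node["count"] += count

    def predict_next(self, prefix):
        """Return the most frequent item following prefix, or None if unknown."""
        node = self.root
        for item in prefix:
            node = node["children"].get(item)
            if node is None:
                return None
        if not node["children"]:
            return None
        return max(node["children"].items(), key=lambda kv: kv[1]["count"])[0]

# Hypothetical frequent patterns with their supports.
tree = PatternTree()
tree.insert(("P1", "P2", "P3"), 5)
tree.insert(("P1", "P2", "P4"), 2)
tree.insert(("P1", "P3"), 4)
next_page = tree.predict_next(("P1", "P2"))
```

Prediction only follows one path from the root, so the candidates considered are the children of a single node rather than all stored patterns.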

3.2 Procedure of Sequential Pattern with SOM Technique

The procedure for constructing the pattern tree in 
the proposed system is as follows: 

Step-1: Collect the web logs of the website.

Step-2: Apply preprocessing [7] to get useful
sessional web data.

Step-3: Take the support threshold as input from the
user, check the Min-Max weight or Average weight
of each page, and generate frequent sequential
patterns.

Step-4: Apply the SOM algorithm to the sessional
frequent sequential pattern items, so that every
item belongs to at least one cluster according to
similarity.

Step-5: Convert the frequent sequential patterns into
Frequent Sequential Pattern [18] form and generate
the Pattern-Tree for next item prediction.

Step-6: Finally, establish good cluster items for
prediction into the cache to improve the quality of
the results and the response time.

Volume : 01, Issue : 06 (September - October 2015)
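
Step-3 leaves the exact weight test open. One possible reading, sketched below, keeps a page when its support meets the threshold and the average of its minimum and maximum weight reaches a chosen value. The rule, the function name and both thresholds are assumptions; the page data is taken from Table-1.

```python
# (page_id, support, min_weight, max_weight), taken from Table-1.
PAGES = [
    ("P1", 9, 2, 31),
    ("P2", 7, 3, 7),
    ("P3", 7, 4, 22),
    ("P4", 6, 5, 9),
    ("P5", 6, 3, 10),
    ("P6", 1, 1, 2),
    ("P7", 2, 1, 3),
]

def frequent_pages(pages, min_support, min_avg_weight):
    """Keep pages that pass both the support and the average-weight test."""
    kept = []
    for page_id, support, w_min, w_max in pages:
        avg_weight = (w_min + w_max) / 2
        if support >= min_support and avg_weight >= min_avg_weight:
            kept.append(page_id)
    return kept

# Hypothetical thresholds: support of at least 6, average weight of at least 6.
selected = frequent_pages(PAGES, min_support=6, min_avg_weight=6.0)
```

With these thresholds P2 is dropped despite its support of 7, because its average weight is only 5, which shows how the weight constraint changes the result compared with support alone.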

IV. EXPERIMENTAL RESULT 

This paper shows the result of clustering web data
using SOM, and uses frequent sequential patterns
to pre-fetch the next item into the cache based on
the similar access patterns of users. Table-1 shows
page details with support and min-max weight
range.



S. No.  Page ID  Page Name    Support  Min. Weight  Max. Weight
1       P1       Books        9        2            31
2       P2       Electronics  7        3            7
3       P3       Cloths       7        4            22
4       P4       Jeweler      6        5            9
5       P5       Furniture    6        3            10
6       P6       Toys         1        1            2
7       P7       Root         2        1            3



Table-1: The example of page with weight range 



Table-2 shows the item details of every page, i.e.
which items belong to each page.

ItemId  PageId  Item in Page
1       1       Item-1
2       1       Item-3
3       1       Item-4
4       2       Item-1
5       2       Item-1
6       2       Item-2
7       3       Item-1
8       3       Item-2
9       3       Item-1
10      4       Item-1
11      4       Item-1
12      4       Item-4
13      5       Item-1
14      5       Item-4
15      5       Item-2
16      6       Item-1
17      6       Item-1
18      6       Item-3



Table-2: The example of page with item 



Table-3 shows the running time (in ms) for
different database record sizes with different
supports.



Size \ Support  1%     2%     3%     4%     5%     6%
1033            6817   6708   6723   6505   6474   6708
2040            18408  18111  18252  18158  18376  18470
3050            15865  17953  18565  18764  18451  18487
4010            24310  24688  24927  25205  26365  24949
5030            83413  38079  84630  54116  40731  38391



Table-3: Running Time (in ms) with different size and different support



Figure-2 shows the running time (in ms) for
different record sizes with different supports.

Figure-2: Running Time (in ms) with different size and different support

Table-4 shows the probability of occurrence of
each item with different support.

Support       Item-1  Item-2  Item-3  Item-4
Support-3 %   0.44    0.19    0.15    0.22
Support-6 %   0.50    0.17    0.08    0.25
Support-10 %  0.43    0.20    0.13    0.23
Support-20 %  0.33    0.00    0.33    0.33



Table-4: Probability of Items with different support 
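
The paper does not spell out how the probabilities are derived. As a rough illustration only, the sketch below computes the relative frequency of each item over the rows of Table-2; this is an assumed definition and it is not expected to reproduce the values in Table-4, which depend on the support setting.

```python
from collections import Counter

# (page_id, item) pairs transcribed from Table-2.
TABLE_2 = [
    (1, "Item-1"), (1, "Item-3"), (1, "Item-4"),
    (2, "Item-1"), (2, "Item-1"), (2, "Item-2"),
    (3, "Item-1"), (3, "Item-2"), (3, "Item-1"),
    (4, "Item-1"), (4, "Item-1"), (4, "Item-4"),
    (5, "Item-1"), (5, "Item-4"), (5, "Item-2"),
    (6, "Item-1"), (6, "Item-1"), (6, "Item-3"),
]

def item_probabilities(rows):
    """Relative frequency of each item, rounded to two decimals."""
    counts = Counter(item for _page, item in rows)
    total = sum(counts.values())
    return {item: round(n / total, 2) for item, n in sorted(counts.items())}

probabilities = item_probabilities(TABLE_2)
```

Item-1 accounts for 10 of the 18 rows, so it dominates under this definition just as it does in most rows of Table-4.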

Figure-3 shows the probability of occurrence
of each item with different support.

[Figure-3: series Item-1, Item-2, Item-3 and Item-4; x-axis groups Support-3, Support-6, Support-10 and Support-20]

Figure-3: Probability of items with different support

Table-5 shows the Info-Gain of each item with
different support.








Support       Item-1  Item-2  Item-3  Item-4
Support-3 %   7.02    6.08    5.51    6.51
Support-6 %   12.00   10.34   7.17    12.00
Support-10 %  7.84    6.97    5.81    7.35
Support-20 %  1.00    1.00    0.00    1.00

Table-5: Info-Gain of items with different support

Figure-4 shows the Info-Gain of each item with
different support.

[Figure-4: series Item-1, Item-2, Item-3 and Item-4]

Figure-4: Info-Gain of items with different support

[Figure-5: chart titled "WSpan vs SpSOM Algorithm"; y-axis from 0 to 90000; x-axis: Supports; series WSpan and SP-SOM]

Figure-5: Comparison of WSpan and SP-SOM Algorithm with different support

Figure-5 shows the comparison between the
WSpan [26] and SP-SOM algorithms. Whether the
support is 1% or 6%, the execution time of the
SP-SOM algorithm is lower. Thus the proposed
SP-SOM algorithm is more efficient.

Table-6 shows the comparison between the
WSpan and SP-SOM algorithms with different
support. Here a record size of 5030 is taken in the
database.



Support                        1%     2%     3%     4%     5%     6%
WSpan (time in ms)             83413  83210  84630  83397  84645  83990
SP-SOM (time in ms)            82009  83100  80745  54116  40731  38391
Improvement in Execution Time  2%     0%     5%     35%    52%    54%

Table-6: Comparison of WSpan and SP-SOM Algorithm with different support (By using Record-Size 5030)
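
The improvement row of Table-6 can be checked from the two timing rows. The sketch below assumes that improvement means the relative reduction in execution time, rounded to the nearest percent, and it reads the 4% and 6% WSpan entries as 83397 and 83990 ms.

```python
SUPPORTS = ["1%", "2%", "3%", "4%", "5%", "6%"]
WSPAN_MS = [83413, 83210, 84630, 83397, 84645, 83990]
SPSOM_MS = [82009, 83100, 80745, 54116, 40731, 38391]

def improvement_percent(baseline_ms, proposed_ms):
    """Relative time saved by the proposed method, as a whole percentage."""
    return round(100 * (baseline_ms - proposed_ms) / baseline_ms)

improvements = [improvement_percent(w, s) for w, s in zip(WSPAN_MS, SPSOM_MS)]
for support, gain in zip(SUPPORTS, improvements):
    print(f"support {support}: {gain}% faster")
```

Under that assumption the computed values match the improvement row of Table-6 for every support level.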



V. ANALYSIS AND PERFORMANCE EVALUATION

In this section, we present a performance study
over datasets of various sizes (e.g. 1000, 2000, 3000,
4000 and 5000 sessions) and with different supports
(e.g. 3, 6, 10 and 20). The experiments compare the
performance of SP-SOM with a recently developed
algorithm, WSpan [1], which is the fastest algorithm
for mining sequential patterns. The main purpose of
this experiment is to demonstrate how effectively
sequential traversal patterns with a min-max weight
constraint can be generated by incorporating
support and page weight with clustering. First, we
show how the number of sequential traversal
patterns can be adjusted through user-allocated
weights, the efficiency in terms of runtime of the
SP-SOM algorithm, and the quality of the sequential
traversal patterns. Second, we show that SP-SOM
puts related items in the cache. Third, we use web
services that automatically update the min-max
weight of every page every fifteen days. This also
decreases the back-and-forth time while finding the
next page from the cache, because related pages
are stored in the cache beforehand [23].
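
The effect of storing related pages in the cache beforehand can be sketched with a small cache that pre-fetches the other pages of a requested page's cluster. The class, the least-recently-used eviction policy, the capacity and the cluster contents are all assumptions made for illustration.

```python
from collections import OrderedDict

class PrefetchCache:
    """LRU cache that also loads the cluster-mates of each requested page."""

    def __init__(self, capacity, clusters):
        self.capacity = capacity
        self.store = OrderedDict()
        # page -> the other pages in the same cluster
        self.related = {p: [q for q in c if q != p] for c in clusters for p in c}

    def _put(self, page):
        self.store[page] = True
        self.store.move_to_end(page)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used page

    def request(self, page):
        """Return True on a cache hit, then pre-fetch the related pages."""
        hit = page in self.store
        self._put(page)
        for other in self.related.get(page, []):
            self._put(other)
        return hit

# Hypothetical clusters, e.g. as produced by the SOM step.
cache = PrefetchCache(capacity=3, clusters=[["P1", "P2", "P3"], ["P4", "P5"]])
hits = [cache.request(p) for p in ["P1", "P2", "P4", "P5"]]
```

The first request to each cluster misses, but the following request to a page of the same cluster is served from the cache.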



while finding the next item in the traditional
approach. Here we cluster the items so that only the
clustered items are scanned, not the whole
database. Third, we use the min-max weight and
support of every page, so that every page has a
different importance. This is enough to perform
extremely computationally expensive operations in
a relatively short amount of time for next page
prediction.